Exploring the normality of numerical columns in a novel data set and producing publication quality tables and reports
We first need to prepare our environment with the necessary packages
options(repos = list(CRAN = "http://cran.rstudio.com/"))
options(digits = 2)
install.packages("pacman")
pacman::p_load(dlookr, # Exploratory data analysis
formattable, # HTML tables from R outputs
kableExtra, # Alternative to formattable
knitr, # Needed to write HTML reports
nycflights13, # Holds the flights data set
tidyverse) # Powerful data wrangling package suite
# Let's load a data set from the flights data set
data("flights")
# What does the data look like?
formattable(head(flights))
| year | month | day | dep_time | sched_dep_time | dep_delay | arr_time | sched_arr_time | arr_delay | carrier | flight | tailnum | origin | dest | air_time | distance | hour | minute | time_hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2013 | 1 | 1 | 517 | 515 | 2 | 830 | 819 | 11 | UA | 1545 | N14228 | EWR | IAH | 227 | 1400 | 5 | 15 | 2013-01-01 05:00:00 |
| 2013 | 1 | 1 | 533 | 529 | 4 | 850 | 830 | 20 | UA | 1714 | N24211 | LGA | IAH | 227 | 1416 | 5 | 29 | 2013-01-01 05:00:00 |
| 2013 | 1 | 1 | 542 | 540 | 2 | 923 | 850 | 33 | AA | 1141 | N619AA | JFK | MIA | 160 | 1089 | 5 | 40 | 2013-01-01 05:00:00 |
| 2013 | 1 | 1 | 544 | 545 | -1 | 1004 | 1022 | -18 | B6 | 725 | N804JB | JFK | BQN | 183 | 1576 | 5 | 45 | 2013-01-01 05:00:00 |
| 2013 | 1 | 1 | 554 | 600 | -6 | 812 | 837 | -25 | DL | 461 | N668DN | LGA | ATL | 116 | 762 | 6 | 0 | 2013-01-01 06:00:00 |
| 2013 | 1 | 1 | 554 | 558 | -4 | 740 | 728 | 12 | UA | 1696 | N39463 | EWR | ORD | 150 | 719 | 5 | 58 | 2013-01-01 05:00:00 |
# What are the properties of the data
formattable(diagnose(flights))
| variables | types | missing_count | missing_percent | unique_count | unique_rate |
|---|---|---|---|---|---|
| year | integer | 0 | 0.00 | 1 | 3.0e-06 |
| month | integer | 0 | 0.00 | 12 | 3.6e-05 |
| day | integer | 0 | 0.00 | 31 | 9.2e-05 |
| dep_time | integer | 8255 | 2.45 | 1319 | 3.9e-03 |
| sched_dep_time | integer | 0 | 0.00 | 1021 | 3.0e-03 |
| dep_delay | numeric | 8255 | 2.45 | 528 | 1.6e-03 |
| arr_time | integer | 8713 | 2.59 | 1412 | 4.2e-03 |
| sched_arr_time | integer | 0 | 0.00 | 1163 | 3.5e-03 |
| arr_delay | numeric | 9430 | 2.80 | 578 | 1.7e-03 |
| carrier | character | 0 | 0.00 | 16 | 4.8e-05 |
| flight | integer | 0 | 0.00 | 3844 | 1.1e-02 |
| tailnum | character | 2512 | 0.75 | 4044 | 1.2e-02 |
| origin | character | 0 | 0.00 | 3 | 8.9e-06 |
| dest | character | 0 | 0.00 | 105 | 3.1e-04 |
| air_time | numeric | 9430 | 2.80 | 510 | 1.5e-03 |
| distance | numeric | 0 | 0.00 | 214 | 6.4e-04 |
| hour | numeric | 0 | 0.00 | 20 | 5.9e-05 |
| minute | numeric | 0 | 0.00 | 60 | 1.8e-04 |
| time_hour | POSIXct | 0 | 0.00 | 6936 | 2.1e-02 |
variables: name of each variabletypes: data type of each variablemissing_count: number of missing valuesmissing_percent: percentage of missing valuesunique_count: number of unique valuesunique_rate: rate of unique value - unique_count / number of observations (c) Cédric Scherer
(c) Andrey Akinshin
NOTE
(c) Andrey Akinshin
NOTE
# Summary statistics
formattable(describe(flights))
| variable | n | na | mean | sd | se_mean | IQR | skewness | kurtosis | p00 | p01 | p05 | p10 | p20 | p25 | p30 | p40 | p50 | p60 | p70 | p75 | p80 | p90 | p95 | p99 | p100 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| year | 336776 | 0 | 2013.0 | 0.0 | 0.0000 | 0 | NaN | NaN | 2013 | 2013 | 2013 | 2013 | 2013 | 2013 | 2013 | 2013 | 2013 | 2013 | 2013 | 2013 | 2013 | 2013 | 2013 | 2013 | 2013 |
| month | 336776 | 0 | 6.5 | 3.4 | 0.0059 | 6 | -0.01340 | -1.19 | 1 | 1 | 1 | 2 | 3 | 4 | 4 | 5 | 7 | 8 | 9 | 10 | 10 | 11 | 12 | 12 | 12 |
| day | 336776 | 0 | 15.7 | 8.8 | 0.0151 | 15 | 0.00774 | -1.19 | 1 | 1 | 2 | 4 | 7 | 8 | 10 | 13 | 16 | 19 | 22 | 23 | 25 | 28 | 29 | 31 | 31 |
| dep_time | 328521 | 8255 | 1349.1 | 488.3 | 0.8519 | 837 | -0.02474 | -1.09 | 1 | 551 | 624 | 703 | 827 | 907 | 1001 | 1200 | 1401 | 1536 | 1700 | 1744 | 1830 | 2008 | 2112 | 2251 | 2400 |
| sched_dep_time | 336776 | 0 | 1344.3 | 467.3 | 0.8053 | 823 | -0.00586 | -1.20 | 106 | 600 | 630 | 705 | 830 | 906 | 1000 | 1200 | 1359 | 1529 | 1655 | 1729 | 1815 | 1945 | 2050 | 2225 | 2359 |
| dep_delay | 328521 | 8255 | 12.6 | 40.2 | 0.0702 | 16 | 4.80254 | 43.95 | -43 | -12 | -9 | -7 | -6 | -5 | -4 | -3 | -2 | 0 | 6 | 11 | 18 | 49 | 88 | 191 | 1301 |
| arr_time | 328063 | 8713 | 1502.1 | 533.3 | 0.9310 | 836 | -0.46782 | -0.19 | 1 | 22 | 736 | 853 | 1023 | 1104 | 1154 | 1338 | 1535 | 1725 | 1854 | 1940 | 2026 | 2159 | 2248 | 2345 | 2400 |
| sched_arr_time | 336776 | 0 | 1536.4 | 497.5 | 0.8572 | 821 | -0.35314 | -0.38 | 1 | 40 | 815 | 917 | 1040 | 1124 | 1214 | 1400 | 1556 | 1738 | 1900 | 1945 | 2030 | 2200 | 2246 | 2353 | 2359 |
| arr_delay | 327346 | 9430 | 6.9 | 44.6 | 0.0780 | 31 | 3.71682 | 29.23 | -86 | -44 | -32 | -26 | -19 | -17 | -14 | -10 | -5 | 1 | 9 | 14 | 21 | 52 | 91 | 190 | 1272 |
| flight | 336776 | 0 | 1971.9 | 1632.5 | 2.8130 | 2912 | 0.66160 | -0.85 | 1 | 11 | 91 | 209 | 421 | 553 | 706 | 1115 | 1496 | 1903 | 2889 | 3465 | 3830 | 4471 | 4695 | 5736 | 8500 |
| air_time | 327346 | 9430 | 150.7 | 93.7 | 0.1638 | 110 | 1.07071 | 0.86 | 20 | 33 | 40 | 47 | 71 | 82 | 93 | 112 | 129 | 146 | 167 | 192 | 214 | 319 | 339 | 364 | 695 |
| distance | 336776 | 0 | 1039.9 | 733.2 | 1.2635 | 887 | 1.12869 | 1.19 | 17 | 169 | 199 | 214 | 427 | 502 | 544 | 733 | 872 | 1023 | 1096 | 1389 | 1598 | 2446 | 2475 | 2586 | 4983 |
| hour | 336776 | 0 | 13.2 | 4.7 | 0.0080 | 8 | -0.00054 | -1.21 | 1 | 6 | 6 | 7 | 8 | 9 | 10 | 12 | 13 | 15 | 16 | 17 | 18 | 19 | 20 | 22 | 23 |
| minute | 336776 | 0 | 26.2 | 19.3 | 0.0333 | 36 | 0.09293 | -1.24 | 0 | 0 | 0 | 0 | 5 | 8 | 11 | 20 | 29 | 30 | 40 | 44 | 45 | 55 | 58 | 59 | 59 |
n : number of observations excluding missing valuesna : number of missing valuesmean : arithmetic averagesd : standard deviationse_mean : standard error mean. sd/sqrt(n)IQR : interquartile range (Q3-Q1)skewness : skewnesskurtosis : kurtosisp25 : Q1. 25% percentilep50 : median. 50% percentilep75 : Q3. 75% percentilep01, p05, p10, p20, p30 : 1%, 5%, 20%, 30% percentilesp40, p60, p70, p80 : 40%, 60%, 70%, 80% percentilesp90, p95, p99, p100 : 90%, 95%, 99%, 100% percentilesThe above is pretty overwhelming, and most people don’t care about percentiles outside of Q1, Q3, and the median (Q2).
# Summary statistics, selecting the desired ones
RefinedTable <- flights %>%
describe() %>%
select(variable, n, na, mean, sd, se_mean, IQR, skewness, kurtosis, p25, p50, p75)
formattable(RefinedTable)
| variable | n | na | mean | sd | se_mean | IQR | skewness | kurtosis | p25 | p50 | p75 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| year | 336776 | 0 | 2013.0 | 0.0 | 0.0000 | 0 | NaN | NaN | 2013 | 2013 | 2013 |
| month | 336776 | 0 | 6.5 | 3.4 | 0.0059 | 6 | -0.01340 | -1.19 | 4 | 7 | 10 |
| day | 336776 | 0 | 15.7 | 8.8 | 0.0151 | 15 | 0.00774 | -1.19 | 8 | 16 | 23 |
| dep_time | 328521 | 8255 | 1349.1 | 488.3 | 0.8519 | 837 | -0.02474 | -1.09 | 907 | 1401 | 1744 |
| sched_dep_time | 336776 | 0 | 1344.3 | 467.3 | 0.8053 | 823 | -0.00586 | -1.20 | 906 | 1359 | 1729 |
| dep_delay | 328521 | 8255 | 12.6 | 40.2 | 0.0702 | 16 | 4.80254 | 43.95 | -5 | -2 | 11 |
| arr_time | 328063 | 8713 | 1502.1 | 533.3 | 0.9310 | 836 | -0.46782 | -0.19 | 1104 | 1535 | 1940 |
| sched_arr_time | 336776 | 0 | 1536.4 | 497.5 | 0.8572 | 821 | -0.35314 | -0.38 | 1124 | 1556 | 1945 |
| arr_delay | 327346 | 9430 | 6.9 | 44.6 | 0.0780 | 31 | 3.71682 | 29.23 | -17 | -5 | 14 |
| flight | 336776 | 0 | 1971.9 | 1632.5 | 2.8130 | 2912 | 0.66160 | -0.85 | 553 | 1496 | 3465 |
| air_time | 327346 | 9430 | 150.7 | 93.7 | 0.1638 | 110 | 1.07071 | 0.86 | 82 | 129 | 192 |
| distance | 336776 | 0 | 1039.9 | 733.2 | 1.2635 | 887 | 1.12869 | 1.19 | 502 | 872 | 1389 |
| hour | 336776 | 0 | 13.2 | 4.7 | 0.0080 | 8 | -0.00054 | -1.21 | 9 | 13 | 17 |
| minute | 336776 | 0 | 26.2 | 19.3 | 0.0333 | 36 | 0.09293 | -1.24 | 8 | 29 | 44 |
formattable(diagnose_category(flights))
| variables | levels | N | freq | ratio | rank |
|---|---|---|---|---|---|
| carrier | UA | 336776 | 58665 | 17.420 | 1 |
| carrier | B6 | 336776 | 54635 | 16.223 | 2 |
| carrier | EV | 336776 | 54173 | 16.086 | 3 |
| carrier | DL | 336776 | 48110 | 14.285 | 4 |
| carrier | AA | 336776 | 32729 | 9.718 | 5 |
| carrier | MQ | 336776 | 26397 | 7.838 | 6 |
| carrier | US | 336776 | 20536 | 6.098 | 7 |
| carrier | 9E | 336776 | 18460 | 5.481 | 8 |
| carrier | WN | 336776 | 12275 | 3.645 | 9 |
| carrier | VX | 336776 | 5162 | 1.533 | 10 |
| tailnum | NA | 336776 | 2512 | 0.746 | 1 |
| tailnum | N725MQ | 336776 | 575 | 0.171 | 2 |
| tailnum | N722MQ | 336776 | 513 | 0.152 | 3 |
| tailnum | N723MQ | 336776 | 507 | 0.151 | 4 |
| tailnum | N711MQ | 336776 | 486 | 0.144 | 5 |
| tailnum | N713MQ | 336776 | 483 | 0.143 | 6 |
| tailnum | N258JB | 336776 | 427 | 0.127 | 7 |
| tailnum | N298JB | 336776 | 407 | 0.121 | 8 |
| tailnum | N353JB | 336776 | 404 | 0.120 | 9 |
| tailnum | N351JB | 336776 | 402 | 0.119 | 10 |
| origin | EWR | 336776 | 120835 | 35.880 | 1 |
| origin | JFK | 336776 | 111279 | 33.042 | 2 |
| origin | LGA | 336776 | 104662 | 31.078 | 3 |
| dest | ORD | 336776 | 17283 | 5.132 | 1 |
| dest | ATL | 336776 | 17215 | 5.112 | 2 |
| dest | LAX | 336776 | 16174 | 4.803 | 3 |
| dest | BOS | 336776 | 15508 | 4.605 | 4 |
| dest | MCO | 336776 | 14082 | 4.181 | 5 |
| dest | CLT | 336776 | 14064 | 4.176 | 6 |
| dest | SFO | 336776 | 13331 | 3.958 | 7 |
| dest | FLL | 336776 | 12055 | 3.580 | 8 |
| dest | MIA | 336776 | 11728 | 3.482 | 9 |
| dest | DCA | 336776 | 9705 | 2.882 | 10 |
| time_hour | 2013-09-13 08:00:00 | 336776 | 94 | 0.028 | 1 |
| time_hour | 2013-09-20 08:00:00 | 336776 | 94 | 0.028 | 1 |
| time_hour | 2013-09-09 08:00:00 | 336776 | 93 | 0.028 | 3 |
| time_hour | 2013-09-16 08:00:00 | 336776 | 93 | 0.028 | 3 |
| time_hour | 2013-09-23 08:00:00 | 336776 | 93 | 0.028 | 3 |
| time_hour | 2013-09-19 08:00:00 | 336776 | 92 | 0.027 | 6 |
| time_hour | 2013-10-11 08:00:00 | 336776 | 92 | 0.027 | 6 |
| time_hour | 2013-09-10 08:00:00 | 336776 | 91 | 0.027 | 8 |
| time_hour | 2013-09-12 08:00:00 | 336776 | 91 | 0.027 | 8 |
| time_hour | 2013-09-17 08:00:00 | 336776 | 91 | 0.027 | 8 |
variables: category nameslevels: group names within categoriesN: number of observationfreq: number of observation at group level / number of observation at category levelratio: percentage of observation at group level / number of observation at category levelrank: rank of the occupancy ratio of levels (order in which the groups are in the category)GroupTable <- flights %>%
group_by(carrier) %>%
select(carrier, dep_delay, arr_delay, air_time) %>%
describe() %>%
select(variable, carrier, n, na, mean, sd, se_mean, IQR, skewness, kurtosis, p25, p50, p75)
formattable(GroupTable)
| variable | carrier | n | na | mean | sd | se_mean | IQR | skewness | kurtosis | p25 | p50 | p75 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| air_time | 9E | 17294 | 1166 | 86.78 | 43 | 0.33 | 61 | 0.763 | 0.083 | 51 | 83.0 | 112.0 |
| air_time | AA | 31947 | 782 | 188.82 | 82 | 0.46 | 81 | 0.657 | -0.285 | 134 | 169.0 | 215.0 |
| air_time | AS | 709 | 5 | 325.62 | 16 | 0.61 | 22 | 0.448 | 0.429 | 314 | 324.0 | 336.0 |
| air_time | B6 | 54049 | 586 | 151.18 | 90 | 0.39 | 120 | 0.776 | -0.091 | 67 | 142.0 | 187.0 |
| air_time | DL | 47658 | 452 | 173.69 | 85 | 0.39 | 101 | 0.882 | -0.407 | 115 | 145.0 | 216.0 |
| air_time | EV | 51108 | 3065 | 90.08 | 40 | 0.18 | 63 | 0.533 | -0.410 | 53 | 87.0 | 116.0 |
| air_time | F9 | 681 | 4 | 229.60 | 15 | 0.58 | 21 | 0.323 | -0.244 | 218 | 229.0 | 239.0 |
| air_time | FL | 3175 | 85 | 101.14 | 24 | 0.42 | 46 | -0.674 | -0.892 | 72 | 109.0 | 118.0 |
| air_time | HA | 342 | 0 | 623.09 | 21 | 1.12 | 27 | 0.441 | 0.258 | 608 | 621.5 | 635.0 |
| air_time | MQ | 25037 | 1360 | 91.18 | 31 | 0.19 | 40 | 0.811 | 0.562 | 70 | 83.0 | 110.0 |
| air_time | OO | 29 | 3 | 83.48 | 35 | 6.53 | 5 | 1.834 | 1.922 | 67 | 68.0 | 72.0 |
| air_time | UA | 57782 | 883 | 211.79 | 101 | 0.42 | 178 | 0.402 | 0.053 | 135 | 197.0 | 313.0 |
| air_time | US | 19831 | 705 | 88.57 | 75 | 0.53 | 47 | 2.080 | 3.093 | 42 | 76.0 | 89.0 |
| air_time | VX | 5116 | 46 | 337.00 | 21 | 0.29 | 29 | -0.071 | -0.126 | 323 | 337.0 | 352.0 |
| air_time | WN | 12044 | 231 | 147.82 | 56 | 0.51 | 83 | 1.033 | 0.374 | 110 | 122.0 | 193.0 |
| air_time | YV | 544 | 57 | 65.74 | 20 | 0.85 | 36 | 0.268 | -1.374 | 47 | 56.5 | 83.0 |
| arr_delay | 9E | 17294 | 1166 | 7.38 | 50 | 0.38 | 36 | 2.859 | 12.432 | -21 | -7.0 | 15.0 |
| arr_delay | AA | 31947 | 782 | 0.36 | 43 | 0.24 | 29 | 4.829 | 54.697 | -21 | -9.0 | 8.0 |
| arr_delay | AS | 709 | 5 | -9.93 | 36 | 1.37 | 34 | 2.266 | 7.525 | -32 | -17.0 | 2.0 |
| arr_delay | B6 | 54049 | 586 | 9.46 | 43 | 0.18 | 31 | 2.884 | 12.279 | -14 | -3.0 | 17.0 |
| arr_delay | DL | 47658 | 452 | 1.64 | 44 | 0.20 | 28 | 5.442 | 58.944 | -20 | -8.0 | 8.0 |
| arr_delay | EV | 51108 | 3065 | 15.80 | 50 | 0.22 | 40 | 2.599 | 9.721 | -14 | -1.0 | 26.0 |
| arr_delay | F9 | 681 | 4 | 21.92 | 62 | 2.36 | 40 | 5.157 | 49.213 | -9 | 6.0 | 31.0 |
| arr_delay | FL | 3175 | 85 | 20.12 | 54 | 0.96 | 31 | 3.846 | 21.312 | -7 | 5.0 | 24.0 |
| arr_delay | HA | 342 | 0 | -6.92 | 75 | 4.06 | 30 | 14.624 | 247.792 | -28 | -13.0 | 2.8 |
| arr_delay | MQ | 25037 | 1360 | 10.77 | 43 | 0.27 | 31 | 4.932 | 67.610 | -13 | -1.0 | 18.0 |
| arr_delay | OO | 29 | 3 | 11.93 | 49 | 9.02 | 22 | 1.970 | 3.161 | -16 | -7.0 | 6.0 |
| arr_delay | UA | 57782 | 883 | 3.56 | 41 | 0.17 | 30 | 3.274 | 16.954 | -18 | -6.0 | 12.0 |
| arr_delay | US | 19831 | 705 | 2.13 | 33 | 0.23 | 23 | 3.861 | 25.304 | -15 | -6.0 | 8.0 |
| arr_delay | VX | 5116 | 46 | 1.76 | 50 | 0.70 | 31 | 3.927 | 25.069 | -23 | -9.0 | 8.0 |
| arr_delay | WN | 12044 | 231 | 9.65 | 47 | 0.43 | 30 | 3.235 | 14.342 | -15 | -3.0 | 15.0 |
| arr_delay | YV | 544 | 57 | 15.56 | 53 | 2.27 | 40 | 2.510 | 8.701 | -16 | -2.0 | 24.2 |
| dep_delay | 9E | 17416 | 1044 | 16.73 | 46 | 0.35 | 23 | 3.345 | 15.705 | -6 | -2.0 | 17.0 |
| dep_delay | AA | 32093 | 636 | 8.59 | 37 | 0.21 | 10 | 6.729 | 89.344 | -6 | -3.0 | 4.0 |
| dep_delay | AS | 712 | 2 | 5.80 | 31 | 1.18 | 10 | 3.955 | 17.806 | -7 | -3.0 | 3.0 |
| dep_delay | B6 | 54169 | 466 | 13.02 | 39 | 0.17 | 17 | 3.695 | 18.276 | -5 | -1.0 | 12.0 |
| dep_delay | DL | 47761 | 349 | 9.26 | 40 | 0.18 | 10 | 7.340 | 92.122 | -5 | -2.0 | 5.0 |
| dep_delay | EV | 51356 | 2817 | 19.96 | 47 | 0.21 | 30 | 2.943 | 11.680 | -5 | -1.0 | 25.0 |
| dep_delay | F9 | 682 | 3 | 20.22 | 58 | 2.23 | 22 | 6.371 | 67.873 | -4 | 0.5 | 18.0 |
| dep_delay | FL | 3187 | 73 | 18.73 | 53 | 0.93 | 21 | 4.459 | 27.084 | -4 | 1.0 | 17.0 |
| dep_delay | HA | 342 | 0 | 4.90 | 74 | 4.01 | 6 | 15.940 | 276.543 | -7 | -4.0 | -1.0 |
| dep_delay | MQ | 25163 | 1234 | 10.55 | 39 | 0.25 | 16 | 6.374 | 101.896 | -7 | -3.0 | 9.0 |
| dep_delay | OO | 29 | 3 | 12.59 | 43 | 8.00 | 13 | 2.338 | 4.851 | -9 | -6.0 | 4.0 |
| dep_delay | UA | 57979 | 686 | 12.11 | 36 | 0.15 | 15 | 4.415 | 26.382 | -4 | 0.0 | 11.0 |
| dep_delay | US | 19873 | 663 | 3.78 | 28 | 0.20 | 7 | 5.658 | 46.450 | -7 | -4.0 | 0.0 |
| dep_delay | VX | 5131 | 31 | 12.87 | 45 | 0.63 | 12 | 5.332 | 41.026 | -4 | 0.0 | 8.0 |
| dep_delay | WN | 12083 | 192 | 17.71 | 43 | 0.39 | 19 | 3.965 | 19.869 | -2 | 1.0 | 17.0 |
| dep_delay | YV | 545 | 56 | 19.00 | 49 | 2.11 | 30 | 3.105 | 12.833 | -7 | -2.0 | 23.0 |
Shapiro-Wilk test looks at whether a target distribution is sample form a normal distribution
FlightsNorm <- flights %>%
select(dep_delay, arr_delay, air_time)
formattable(normality(FlightsNorm))
| vars | statistic | p_value | sample |
|---|---|---|---|
| dep_delay | 0.52 | 1.4e-78 | 5000 |
| arr_delay | 0.70 | 4.8e-69 | 5000 |
| air_time | 0.88 | 1.7e-51 | 5000 |
Q-Q plots look at the quartiles of a target data set and plot it against predicted quartiles from a normal distribution
FlightsNorm %>%
plot_normality(dep_delay, arr_delay, air_time)
Looking within carrier at the subgroup normality
# Shapiro-Wilk test
NormCarrierTable <- flights %>%
group_by(carrier) %>%
select(carrier, dep_delay) %>%
normality()
formattable(NormCarrierTable)
| variable | carrier | statistic | p_value | sample |
|---|---|---|---|---|
| dep_delay | 9E | 0.59 | 1.2e-74 | 5000 |
| dep_delay | AA | 0.48 | 8.6e-81 | 5000 |
| dep_delay | AS | 0.51 | 2.3e-40 | 714 |
| dep_delay | B6 | 0.56 | 4.9e-77 | 5000 |
| dep_delay | DL | 0.37 | 2.8e-85 | 5000 |
| dep_delay | EV | 0.65 | 1.7e-71 | 5000 |
| dep_delay | F9 | 0.47 | 1.1e-40 | 685 |
| dep_delay | FL | 0.52 | 5.8e-69 | 3260 |
| dep_delay | HA | 0.12 | 7.1e-37 | 342 |
| dep_delay | MQ | 0.51 | 9.5e-79 | 5000 |
| dep_delay | OO | 0.60 | 1.2e-07 | 32 |
| dep_delay | UA | 0.52 | 5.8e-79 | 5000 |
| dep_delay | US | 0.46 | 3.8e-81 | 5000 |
| dep_delay | VX | 0.43 | 8.0e-83 | 5000 |
| dep_delay | WN | 0.53 | 2.8e-78 | 5000 |
| dep_delay | YV | 0.62 | 7.0e-33 | 601 |
# Q-Q plot
flights %>%
group_by(carrier) %>%
select(carrier, dep_delay) %>%
plot_normality()
# Remove the '#' below to reproduce an HTML from an R script.
#eda_web_report(flights)